Results 1 - 20 of 14,148
1.
Nat Commun ; 15(1): 3126, 2024 Apr 11.
Article in English | MEDLINE | ID: mdl-38605047

ABSTRACT

Long reads that cover more variants per read raise opportunities for accurate haplotype construction, whereas the genotype errors of single nucleotide polymorphisms pose great computational challenges for haplotyping tools. Here we introduce KSNP, an efficient haplotype construction tool based on the de Bruijn graph (DBG). KSNP leverages the ability of DBG in handling high-throughput erroneous reads to tackle the challenges. Compared to other notable tools in this field, KSNP achieves at least 5-fold speedup while producing comparable haplotype results. The time required for assembling human haplotypes is reduced to nearly the data-in time.


Subjects
Algorithms , Single Nucleotide Polymorphism , Humans , Haplotypes/genetics , DNA Sequence Analysis/methods , High-Throughput Nucleotide Sequencing/methods , Software
2.
Biochemistry (Mosc) ; 89(Suppl 1): S234-S248, 2024 Jan.
Article in English | MEDLINE | ID: mdl-38621753

ABSTRACT

This review highlights the operating principles, features, and current developments of third-generation sequencing of biopolymers, focusing on nucleic acid analysis with the nanopore sequencing system. The basics of the method and the technical solutions used to implement it are considered, from the first works showing that such systems could be built to the easy-to-handle procedure developed by Oxford Nanopore Technologies. The review also covers applications developed and implemented using Oxford Nanopore Technologies equipment, including whole-genome assembly, metagenomics, and direct detection of modified bases.


Subjects
Nanopore Sequencing , Nanopores , DNA Sequence Analysis/methods , Biopolymers , High-Throughput Nucleotide Sequencing/methods
3.
BMC Pediatr ; 24(1): 230, 2024 Apr 01.
Article in English | MEDLINE | ID: mdl-38561707

ABSTRACT

BACKGROUND: Newborn screening (NBS), such as tandem mass spectrometry (MS/MS), may yield false positive/negative results. Next-generation sequencing (NGS) has the potential to provide increased data output, efficiencies, and applications. This study aimed to analyze the types and distribution of pathogenic gene mutations in newborns in Huzhou, Zhejiang province, China and explore the applicability of NGS and MS/MS in NBS. METHODS: Blood spot samples from 1263 newborns were collected. NGS was employed to screen for pathogenic variants in 542 disease-causing genes, and detected variants were validated using Sanger sequencing. Simultaneously, 26 inherited metabolic diseases (IMD) were screened using MS/MS. Positive or suspicious samples identified through MS/MS were cross-referenced with the results of NGS. RESULTS: Among all newborns, 328 had no gene mutations detected. NGS revealed at least one gene mutation in 935 newborns, with a mutation rate of 74.0%. The top 5 genes were FLG, GJB2, UGT1A1, USH2A, and DUOX2. According to American College of Medical Genetics guidelines, gene mutations in 260 cases were classified as pathogenic or likely pathogenic mutation, with a positive rate of 20.6%. The top 5 genes were UGT1A1, FLG, GJB2, MEFV, and G6PD. MS/MS identified 18 positive or suspicious samples for IMD and 1245 negative samples. Verification of these cases by NGS results showed no pathogenic mutations, resulting in a false positive rate of 1.4% (18/1263). CONCLUSION: NBS using NGS technology broadened the range of diseases screened, and enhanced the accuracy of diagnoses in comparison to MS/MS for screening IMD. Combining NGS and biochemical screening would improve the efficiency of current NBS.
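
The rates quoted in this abstract follow directly from the reported counts. As a minimal sketch (the variable and function names are illustrative and not taken from the study), the arithmetic can be re-derived as:

```python
# Counts taken from the abstract; variable names are illustrative.
total_newborns = 1263
with_variant = 935        # at least one gene mutation detected by NGS
without_variant = 328     # no gene mutation detected by NGS
pathogenic_cases = 260    # classified pathogenic or likely pathogenic
msms_flagged = 18         # positive or suspicious by MS/MS
msms_negative = 1245      # negative by MS/MS


def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


mutation_rate = percent(with_variant, total_newborns)
positive_rate = percent(pathogenic_cases, total_newborns)
false_positive_rate = percent(msms_flagged, total_newborns)

print(mutation_rate, positive_rate, false_positive_rate)
```

The two NGS groups (328 + 935) and the two MS/MS groups (18 + 1245) each sum to the 1263 newborns sampled, so every rate shares the same denominator.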


Subjects
Metabolic Diseases , Neonatal Screening , Newborn Infant , Humans , Neonatal Screening/methods , Tandem Mass Spectrometry , Metabolic Diseases/diagnosis , Mutation , High-Throughput Nucleotide Sequencing/methods , Pyrin/genetics
4.
Sci Rep ; 14(1): 8159, 2024 Apr 08.
Article in English | MEDLINE | ID: mdl-38589623

ABSTRACT

Whole-genome sequencing (WGS) is currently making its transition from research tool into routine (clinical) diagnostic practice. The workflow for WGS includes the highly labor-intensive library preparations (LP), one of the most critical steps in the WGS procedure. Here, we describe the automation of the LP on the flowbot ONE robot to minimize the risk of human error and reduce hands-on time (HOT). For this, the robot was equipped, programmed, and optimized to perform the Illumina DNA Prep automatically. Results obtained from 16 LP that were performed both manually and automatically showed comparable library DNA yields (median of 1.5-fold difference), similar assembly quality values, and 100% concordance on the final core genome multilocus sequence typing results. In addition, reproducibility of results was confirmed by re-processing eight of the 16 LPs using the automated workflow. With the automated workflow, the HOT was reduced to 25 min compared to the 125 min needed when performing eight LPs using the manual workflow. The turn-around time was 170 and 200 min for the automated and manual workflow, respectively. In summary, the automated workflow on the flowbot ONE generates consistent results in terms of reliability and reproducibility, while significantly reducing HOT as compared to manual LP.


Subjects
Lipopolysaccharides , Robotics , Humans , Reproducibility of Results , High-Throughput Nucleotide Sequencing/methods , Gene Library , Whole Genome Sequencing , DNA , Workflow
5.
Genome Biol ; 25(1): 91, 2024 Apr 08.
Article in English | MEDLINE | ID: mdl-38589937

ABSTRACT

BACKGROUND: Although sequencing technologies have boosted the measurement of the genomic diversity of plant crops, it remains challenging to accurately genotype millions of genetic variants, especially structural variations, with only short reads. In recent years, many graph-based variation genotyping methods have been developed to address this issue and tested for human genomes. However, their performance in plant genomes remains largely elusive. Furthermore, pipelines integrating the advantages of current genotyping methods might be required, considering the different complexity of plant genomes. RESULTS: Here we comprehensively evaluate eight such genotypers in different scenarios in terms of variant type and size, sequencing parameters, genomic context, and complexity, as well as graph size, using both simulated and real data sets from representative plant genomes. Our evaluation reveals that there are still great challenges to applying existing methods to plants, such as excessive repeats and variants or high resource consumption. Therefore, we propose a pipeline called Ensemble Variant Genotyper (EVG) that can achieve better genotyping performance in almost all experimental scenarios and comparably higher genotyping recall and precision even using 5× reads. Furthermore, we demonstrate that EVG is more robust with an increasing number of graphed genomes, especially for insertions and deletions. CONCLUSIONS: Our study will provide new insights into the development and application of graph-based genotyping algorithms. We conclude that EVG provides an accurate, unbiased, and cost-effective way for genotyping both small and large variations and will be potentially used in population-scale genotyping for large, repetitive, and heterozygous plant genomes.


Subjects
Algorithms , Benchmarking , Humans , Genotype , Genomics/methods , Genotyping Techniques/methods , Plant Genome , High-Throughput Nucleotide Sequencing/methods , DNA Sequence Analysis/methods
6.
Cancer Med ; 13(7): e7162, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38572952

ABSTRACT

PURPOSE: Genetic mutation detection has become an important step in nonsmall-cell lung cancer (NSCLC) treatment because of the increasing number of drugs that target genomic rearrangements. A multiplex test that can detect multiple gene mutations prior to treatment is thus necessary. Currently, either next-generation sequencing (NGS)-based or polymerase chain reaction (PCR)-based tests are used. We evaluated the performance of the Oncomine Dx Target Test (ODxTT), an NGS-based multiplex biomarker panel test, and the AmoyDx Pan Lung Cancer PCR Panel (AmoyDx PLC panel), a real-time PCR-based multiplex biomarker panel test. MATERIALS AND METHODS: Patients with histologically diagnosed NSCLC and a sufficient sample volume to simultaneously perform the AmoyDx PLC panel and ODxTT-M were included in the study. The success and detection rates of both tests were evaluated. RESULTS: Biopsies revealed 116 cases of malignancies, 100 of which were NSCLC. Of these, 59 met the inclusion criteria and were eligible for analysis. The success rates were 100% and 98% for AmoyDx PLC panel and ODxTT-M, respectively. Nine driver mutations were detected in 35.9% and 37.3% of AmoyDx PLC and ODxTT-M panels, respectively. EGFR mutations were detected in 14% and 12% of samples using the AmoyDx PLC panel and ODxTT-M, respectively. Of the 58 cases in which both NGS and AmoyDx PLC panels were successful, discordant results were observed in seven cases. These differences were mainly due to different sensitivities of the detection methods used and the gene variants targeted in each test. DISCUSSION: The AmoyDx PLC panel, a PCR-based multiplex diagnostic test, exhibits a high success rate. The frequency of the nine genes targeted for treatment detected by the AmoyDx PLC panel was comparable to the frequency of mutations detected by ODxTT-M. Clinicians should understand and use the AmoyDx PLC panel and ODxTT-M with respect to their respective performances and limitations.


Subjects
Non-Small-Cell Lung Carcinoma , Lung Neoplasms , Humans , Lung Neoplasms/diagnosis , Lung Neoplasms/genetics , Lung Neoplasms/drug therapy , Multiplex Polymerase Chain Reaction , Non-Small-Cell Lung Carcinoma/diagnosis , Non-Small-Cell Lung Carcinoma/genetics , Non-Small-Cell Lung Carcinoma/drug therapy , Mutation , High-Throughput Nucleotide Sequencing/methods , Biomarkers
7.
Sci Rep ; 14(1): 7988, 2024 Apr 05.
Article in English | MEDLINE | ID: mdl-38580715

ABSTRACT

In the human genome, heterozygous sites refer to genomic positions with a different allele or nucleotide variant on the maternal and paternal chromosomes. Resolving these allelic differences by chromosomal copy, also known as phasing, is achievable on a short-read sequencer when using a library preparation method that captures long-range genomic information. TELL-Seq is a library preparation that captures long-range genomic information with the aid of molecular identifiers (barcodes). The same barcode is used to tag the reads derived from the same long DNA fragment within a range of up to 200 kilobases (kb), generating linked-reads. This strategy can be used to phase an entire genome. Here, we introduce a TELL-Seq protocol developed for targeted applications, enabling the phasing of enriched loci of varying sizes, purity levels, and heterozygosity. To validate this protocol, we phased 2-200 kb loci enriched with different methods: CRISPR/Cas9-mediated excision coupled with pulsed-field electrophoresis for the longest fragments, CRISPR/Cas9-mediated protection from exonuclease digestion for mid-size fragments, and long PCR for the shortest fragments. All selected loci have known clinical relevance: BRCA1, BRCA2, MLH1, MSH2, MSH6, APC, PMS2, SCN5A-SCN10A, and PIK3CA. Collectively, the analyses show that TELL-Seq can accurately phase 2-200 kb targets using a short-read sequencer.


Subjects
Genomics , High-Throughput Nucleotide Sequencing , Humans , DNA Sequence Analysis/methods , High-Throughput Nucleotide Sequencing/methods , DNA/genetics , Human Genome
8.
Bioinformatics ; 40(4), 2024 Mar 29.
Article in English | MEDLINE | ID: mdl-38565260

ABSTRACT

MOTIVATION: Automated chromatin segmentation based on ChIP-seq (chromatin immunoprecipitation followed by sequencing) data reveals insights into the epigenetic regulation of chromatin accessibility. Existing segmentation methods are constrained by simplifying modeling assumptions, which may have a negative impact on the segmentation quality. RESULTS: We introduce EpiSegMix, a novel segmentation method based on a hidden Markov model with flexible read count distribution types and state duration modeling, allowing for a more flexible modeling of both histone signals and segment lengths. In a comparison with existing tools, ChromHMM, Segway, and EpiCSeg, we show that EpiSegMix is more predictive of cell biology, such as gene expression. Its flexible framework enables it to fit an accurate probabilistic model, which has the potential to increase the biological interpretability of chromatin states. AVAILABILITY AND IMPLEMENTATION: Source code: https://gitlab.com/rahmannlab/episegmix.


Subjects
Chromatin , Genetic Epigenesis , DNA Sequence Analysis/methods , Histones/metabolism , Software , High-Throughput Nucleotide Sequencing/methods
9.
Genome Biol ; 25(1): 90, 2024 Apr 08.
Article in English | MEDLINE | ID: mdl-38589969

ABSTRACT

Single-cell ATAC-seq has emerged as a powerful approach for revealing candidate cis-regulatory elements genome-wide at cell-type resolution. However, current single-cell methods suffer from limited throughput and high costs. Here, we present a novel technique called scifi-ATAC-seq, single-cell combinatorial fluidic indexing ATAC-sequencing, which combines a barcoded Tn5 pre-indexing step with droplet-based single-cell ATAC-seq using the 10X Genomics platform. With scifi-ATAC-seq, up to 200,000 nuclei across multiple samples can be indexed in a single emulsion reaction, representing an approximately 20-fold increase in throughput compared to the standard 10X Genomics workflow.


Subjects
Chromatin Immunoprecipitation Sequencing , Chromatin , High-Throughput Nucleotide Sequencing/methods , DNA Sequence Analysis/methods , Cell Nucleus
10.
BMC Cancer ; 24(1): 489, 2024 Apr 17.
Article in English | MEDLINE | ID: mdl-38632507

ABSTRACT

BACKGROUND: Next-generation sequencing (NGS) is essential for lung cancer treatment. It is important to collect sufficient tissue specimens, but sometimes we cannot obtain large enough samples for NGS analysis. We investigated the yield of NGS analysis by frozen cytology pellets using an Oncomine Comprehensive Assay or Oncomine Precision Assay. METHODS: We retrospectively enrolled patients with lung cancer who underwent bronchoscopy at Kobe University Hospital and were enrolled in the Lung Cancer Genomic Screening Project for Individualized Medicine. We investigated the amount of extracted DNA and RNA and determined the NGS success rates. We also compared the amount of DNA and RNA by bronchoscopy methods. To create the frozen cytology pellets, we first effectively collected the cells and then quickly centrifuged and cryopreserved them. RESULTS: A total of 132 patients were enrolled in this study between May 2016 and December 2022; of them, 75 were subjected to frozen cytology pellet examinations and 57 were subjected to frozen tissue examinations. The amount of DNA and RNA obtained by frozen cytology pellets was nearly equivalent to frozen tissues. Frozen cytology pellets collected by endobronchial ultrasound-guided transbronchial needle aspiration yielded significantly more DNA than those collected by transbronchial biopsy methods (P < 0.01). In RNA content, cytology pellets were not inferior to frozen tissue. The success rate of NGS analysis with frozen cytology pellet specimens was comparable to the success rate of NGS analysis with frozen tissue specimens. CONCLUSIONS: Our study showed that frozen cytology pellets may have equivalent diagnostic value to frozen tissue for NGS analyses. Bronchial cytology specimens are usually used only for cytology, but NGS analysis is possible if enough cells are collected to create pellet specimens. In particular, the frozen cytology pellets obtained by endobronchial ultrasound-guided transbronchial needle aspiration yielded sufficient amounts of DNA. TRIAL REGISTRATION: This was registered with the University Medical Hospital Information Network in Japan (UMINCTR registration no. UMIN000052050).


Subjects
Lung Neoplasms , Humans , Retrospective Studies , Lung Neoplasms/pathology , Endoscopic Ultrasound-Guided Fine Needle Aspiration/methods , Bronchoscopy/methods , High-Throughput Nucleotide Sequencing/methods , DNA , RNA , Lymph Nodes/pathology
11.
Brief Bioinform ; 25(3), 2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38632951

ABSTRACT

In cancer genomics, variant calling has advanced, but traditional mean accuracy evaluations are inadequate for biomarkers like tumor mutation burden, which vary significantly across samples, affecting immunotherapy patient selection and threshold settings. In this study, we introduce TMBstable, an innovative method that dynamically selects optimal variant calling strategies for specific genomic regions using a meta-learning framework, distinguishing it from traditional callers with uniform sample-wide strategies. The process begins with segmenting the sample into windows and extracting meta-features for clustering, followed by using a pre-trained meta-model to select suitable algorithms for each cluster, thereby addressing strategy-sample mismatches, reducing performance fluctuations and ensuring consistent performance across various samples. We evaluated TMBstable using both simulated and real non-small cell lung cancer and nasopharyngeal carcinoma samples, comparing it with advanced callers. The assessment, focusing on stability measures, such as the variance and coefficient of variation in false positive rate, false negative rate, precision and recall, involved 300 simulated and 106 real tumor samples. Benchmark results showed TMBstable's superior stability with the lowest variance and coefficient of variation across performance metrics, highlighting its effectiveness in analyzing the counting-based biomarker. The TMBstable algorithm can be accessed at https://github.com/hello-json/TMBstable for academic usage only.
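
The stability measures named in this abstract (variance and coefficient of variation of a per-sample metric) can be illustrated with a short sketch. This is not the TMBstable code; the function name and the sample values in the usage note are hypothetical, and the population standard deviation is an assumption because the abstract does not say which estimator was used.

```python
from statistics import mean, pstdev, pvariance


def stability_summary(per_sample_metric):
    """Return (variance, coefficient of variation) of a metric across samples.

    Uses population variance and standard deviation; this choice is an
    assumption, since the abstract does not specify the estimator.
    """
    centre = mean(per_sample_metric)
    if centre == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return pvariance(per_sample_metric), pstdev(per_sample_metric) / centre
```

For example, `stability_summary([2, 4, 4, 4, 5, 5, 7, 9])` has mean 5 and population standard deviation 2, so the coefficient of variation is 0.4. A lower value means the metric fluctuates less from sample to sample.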


Subjects
Non-Small-Cell Lung Carcinoma , Lung Neoplasms , Humans , High-Throughput Nucleotide Sequencing/methods , Genomics/methods , Genome , Algorithms
12.
Sci Rep ; 14(1): 9000, 2024 Apr 18.
Article in English | MEDLINE | ID: mdl-38637641

ABSTRACT

Long-read genome sequencing (lrGS) is a promising method in genetic diagnostics. Here we investigate the potential of lrGS to detect a disease-associated chromosomal translocation between 17p13 and the chromosome 19 centromere. We constructed two sets of phased and non-phased de novo assemblies: (i) based on lrGS only and (ii) hybrid assemblies combining lrGS with optical mapping, using lrGS reads with a median coverage of 34X. Variant calling detected both structural variants (SVs) and small variants, and the accuracy of the small variant calls was compared with calls from short-read genome sequencing (srGS). The de novo and hybrid assemblies had high quality and contiguity, with an N50 of 62.85 Mb, enabling a near telomere-to-telomere assembly with fewer than 100 contigs per haplotype. Notably, we successfully identified the centromeric breakpoint of the translocation. A concordance of 92% was observed when comparing small variant calling between srGS and lrGS. In summary, our findings underscore the remarkable potential of lrGS as a comprehensive and accurate solution for the analysis of SVs and small variants. Thus, lrGS could replace a large battery of genetic tests that were used for the diagnosis of a single symptomatic translocation carrier, highlighting the potential of lrGS in the realm of digital karyotyping.
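
The N50 statistic quoted in this abstract is a standard contiguity measure: the contig length at which contigs of that length or longer account for at least half of the assembly. A minimal sketch of the usual definition follows; the contig lengths in the example are hypothetical and are not data from the study.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly.

    Contigs are visited from longest to shortest; the N50 is the length of
    the contig at which the running total first reaches half of the summed
    assembly length.
    """
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("contig_lengths must contain at least one contig")


# Hypothetical contig lengths (arbitrary units), for illustration only.
example_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(example_contigs))  # 70
```

In the example the contigs sum to 300, and the two longest (80 + 70) already reach 150, so the N50 is 70.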


Subjects
High-Throughput Nucleotide Sequencing , Genetic Translocation , Humans , DNA Sequence Analysis/methods , High-Throughput Nucleotide Sequencing/methods , Base Sequence , Centromere/genetics
13.
Front Cell Infect Microbiol ; 14: 1329235, 2024.
Article in English | MEDLINE | ID: mdl-38638828

ABSTRACT

Metagenomic next-generation sequencing (mNGS) is a preferred genotyping method for identifying organisms, elucidating metabolic pathways, and characterizing microbiota. It can accurately obtain all the nucleic acid information in the test sample. Anthrax is one of the most important zoonotic diseases, infecting mainly herbivores and occasionally humans. The disease has four typical clinical forms (cutaneous, gastrointestinal, inhalation, and injection), all of which may result in sepsis or meningitis, with cutaneous being the most common form. Here, we report a case of cutaneous anthrax diagnosed by mNGS in a butcher. Histopathology of a skin biopsy revealed PAS-positive bacilli. mNGS of a formalin-fixed paraffin-embedded (FFPE) tissue sample confirmed the diagnosis of anthrax. He was cured with intravenous penicillin. To our knowledge, this is the first case of cutaneous anthrax diagnosed by mNGS using FFPE tissue. mNGS is useful for identifying pathogens that are difficult to diagnose with conventional methods, and FFPE samples are simple to manage. Compared with traditional bacterial culture, which is difficult and time-consuming, mNGS can help diagnose anthrax quickly and accurately, so that the disease can be controlled in a timely manner and outbreaks prevented.


Subjects
Anthrax , Bacterial Skin Diseases , Male , Humans , Anthrax/diagnosis , Paraffin Embedding , Formaldehyde/therapeutic use , High-Throughput Nucleotide Sequencing/methods , Metagenomics/methods , Sensitivity and Specificity
14.
Genome Biol ; 25(1): 101, 2024 Apr 19.
Article in English | MEDLINE | ID: mdl-38641647

ABSTRACT

Many bioinformatics methods seek to reduce reference bias, but no methods exist to comprehensively measure it. Biastools analyzes and categorizes instances of reference bias. It works in various scenarios: when the donor's variants are known and reads are simulated; when donor variants are known and reads are real; and when variants are unknown and reads are real. Using biastools, we observe that more inclusive graph genomes result in fewer biased sites. We find that end-to-end alignment reduces bias at indels relative to local aligners. Finally, we use biastools to characterize how T2T references improve large-scale bias.


Subjects
Genome , Genomics , Genomics/methods , Computational Biology , INDEL Mutation , Bias , DNA Sequence Analysis/methods , Software , High-Throughput Nucleotide Sequencing/methods
15.
Pediatr Int ; 66(1): e15760, 2024.
Article in English | MEDLINE | ID: mdl-38641939

ABSTRACT

Diseases are caused by genetic and/or environmental factors. It is important to understand the pathomechanism of monogenic diseases that are caused only by genetic factors, especially prenatal- or childhood-onset diseases for pediatricians. Identifying "novel" disease genes and elucidating how genomic changes lead to human phenotypes would develop new therapeutic approaches for rare diseases for which no fundamental cure has yet been established. Genomic analysis has evolved along with the development of analytical techniques, from Sanger sequencing (first-generation sequencing) to techniques such as comparative genomic hybridization, massive parallel short-read sequencing (using a next-generation sequencer or second-generation sequencer) and long-read sequencing (using a next-next generation sequencer or third-generation sequencer). I have been researching human genetics using conventional and new technologies, together with my mentors and numerous collaborators, and have identified genes responsible for more than 60 diseases. Here, an overview of genomic analyses of monogenic diseases that aims to identify novel disease genes, and several examples using different approaches depending on the disease characteristics are presented.


Subjects
High-Throughput Nucleotide Sequencing , Humans , Child , Comparative Genomic Hybridization , Phenotype , High-Throughput Nucleotide Sequencing/methods
16.
Bioinformatics ; 40(4), 2024 Mar 29.
Article in English | MEDLINE | ID: mdl-38569896

ABSTRACT

MOTIVATION: Long-read sequencing technologies, an attractive solution for many applications, often suffer from higher error rates. Alignment of multiple reads can improve base-calling accuracy, but some applications, e.g. sequencing mutagenized libraries where multiple distinct clones differ by one or few variants, require the use of barcodes or unique molecular identifiers. Unfortunately, sequencing errors can interfere with correct barcode identification, and a given barcode sequence may be linked to multiple independent clones within a given library. RESULTS: Here we focus on the target application of sequencing mutagenized libraries in the context of multiplexed assays of variant effects (MAVEs). MAVEs are increasingly used to create comprehensive genotype-phenotype maps that can aid clinical variant interpretation. Many MAVE methods use long-read sequencing of barcoded mutant libraries for accurate association of barcode with genotype. Existing long-read sequencing pipelines do not account for inaccurate sequencing or nonunique barcodes. Here, we describe Pacybara, which handles these issues by clustering long reads based on the similarities of (error-prone) barcodes while also detecting barcodes that have been associated with multiple genotypes. Pacybara also detects recombinant (chimeric) clones and reduces false positive indel calls. In three example applications, we show that Pacybara identifies and correctly resolves these issues. AVAILABILITY AND IMPLEMENTATION: Pacybara, freely available at https://github.com/rothlab/pacybara, is implemented using R, Python, and bash for Linux. It runs on GNU/Linux HPC clusters via Slurm, PBS, or GridEngine schedulers. A single-machine simplex version is also available.


Subjects
High-Throughput Nucleotide Sequencing , Software , DNA Sequence Analysis/methods , High-Throughput Nucleotide Sequencing/methods , Gene Library , Genotype , Cluster Analysis
17.
BMC Genomics ; 25(1): 365, 2024 Apr 15.
Article in English | MEDLINE | ID: mdl-38622536

ABSTRACT

BACKGROUND: Microbial genomes are largely comprised of protein coding sequences, yet some genomes contain many pseudogenes caused by frameshifts or internal stop codons. These pseudogenes are believed to result from gene degradation during evolution but could also be technical artifacts of genome sequencing or assembly. RESULTS: Using a combination of observational and experimental data, we show that many putative pseudogenes are attributable to errors that are incorporated into genomes during assembly. Within 126,564 publicly available genomes, we observed that nearly identical genomes often substantially differed in pseudogene counts. Causal inference implicated assembler, sequencing platform, and coverage as likely causative factors. Reassembly of genomes from raw reads confirmed that each variable affects the number of putative pseudogenes in an assembly. Furthermore, simulated sequencing reads corroborated our observations that the quality and quantity of raw data can significantly impact the number of pseudogenes in an assembler-dependent fashion. The number of unexpected pseudogenes due to internal stops was highly correlated (R² = 0.96) with average nucleotide identity to the ground truth genome, implying relative pseudogene counts can be used as a proxy for overall assembly correctness. Applying our method to assemblies in RefSeq resulted in rejection of 3.6% of assemblies due to significantly elevated pseudogene counts. Reassembly from real reads obtained from high coverage genomes showed considerable variability in spurious pseudogenes beyond that observed with simulated reads, reinforcing the finding that high coverage is necessary to mitigate assembly errors. CONCLUSIONS: Collectively, these results demonstrate that many pseudogenes in microbial genome assemblies are actually genes. Our results suggest that high read coverage is required for correct assembly and indicate an inflated number of pseudogenes due to internal stops is indicative of poor overall assembly quality.


Subjects
Bacterial Genome , Pseudogenes , Pseudogenes/genetics , Chromosome Mapping , Base Sequence , Microbial Genome , DNA Sequence Analysis/methods , High-Throughput Nucleotide Sequencing/methods
18.
Virol J ; 21(1): 86, 2024 Apr 15.
Article in English | MEDLINE | ID: mdl-38622686

ABSTRACT

BACKGROUND: Viruses have notable effects on agroecosystems, wherein they can adversely affect plant health and cause problems (e.g., increased biosecurity risks and economic losses). However, our knowledge of their diversity and interactions with specific host plants in ecosystems remains limited. To enhance our understanding of the roles that viruses play in agroecosystems, comprehensive analyses of the viromes of a wide range of plants are essential. High-throughput sequencing (HTS) techniques are useful for conducting impartial and unbiased investigations of plant viromes, ultimately forming a basis for generating further biological and ecological insights. This study was conducted to thoroughly characterize the viral community dynamics in individual plants. RESULTS: An HTS-based virome analysis in conjunction with proximity sampling and a tripartite network analysis were performed to investigate the viral diversity in chunkung (Cnidium officinale) plants. We identified 61 distinct chunkung plant-associated viruses (27 DNA and 34 RNA viruses) from 21 known genera and 6 unclassified genera in 14 known viral families. Notably, 12 persistent viruses (7 DNA and 5 RNA viruses) were exclusive to dwarfed chunkung plants. The detection of viruses from the families Partitiviridae, Picobirnaviridae, and Spinareoviridae only in the dwarfed plants suggested that they may contribute to the observed dwarfism. The co-infection of chunkung by multiple viruses is indicative of a dynamic and interactive viral ecosystem with significant sequence variability and evidence of recombination. CONCLUSIONS: We revealed the viral community involved in chunkung. Our findings suggest that chunkung serves as a significant reservoir for a variety of plant viruses. Moreover, the co-infection rate of individual plants was unexpectedly high. Future research will need to elucidate the mechanisms enabling several dozen viruses to co-exist in chunkung. Nevertheless, the important insights into the chunkung virome generated in this study may be relevant to developing effective plant viral disease management and control strategies.


Subjects
Coinfection , Dwarfism , Plant Viruses , RNA Viruses , Humans , Virome , Ecosystem , Cnidium/genetics , Viral RNA/genetics , High-Throughput Nucleotide Sequencing/methods , Plant Viruses/genetics , DNA , Phylogeny
19.
Gigascience ; 13, 2024 Jan 02.
Article in English | MEDLINE | ID: mdl-38626722

ABSTRACT

BACKGROUND: Most currently available reference genomes lack the sequence map of sex-limited (such as Y and W) chromosomes, which results in incomplete assemblies that hinder further research on sex chromosomes. Recent advancements in long-read sequencing and population sequencing have provided the opportunity to assemble sex-limited chromosomes without the traditional complicated experimental efforts. FINDINGS: We introduce the first computational method, Sorting long Reads of Y or other sex-limited chromosome (SRY), which achieves improved assembly results compared to flow sorting. Specifically, SRY outperforms in the heterochromatic region and demonstrates comparable performance in other regions. Furthermore, SRY enhances the capabilities of the hybrid assembly software, resulting in improved continuity and accuracy. CONCLUSIONS: Our method enables true complete genome assembly and facilitates downstream research of sex-limited chromosomes.


Subjects
Genome , Sex Chromosomes , Sex Chromosomes/genetics , DNA Sequence Analysis/methods , High-Throughput Nucleotide Sequencing/methods
20.
BMC Bioinformatics ; 25(Suppl 1): 153, 2024 Apr 16.
Article in English | MEDLINE | ID: mdl-38627615

ABSTRACT

BACKGROUND: With the rapid increase in throughput of long-read sequencing technologies, recent studies have explored their potential for taxonomic classification by using alignment-based approaches to reduce the impact of higher sequencing error rates. While alignment-based methods are generally slower, k-mer-based taxonomic classifiers can overcome this limitation, potentially at the expense of lower sensitivity for strains and species that are not in the database. RESULTS: We present MetageNN, a memory-efficient long-read taxonomic classifier that is robust to sequencing errors and missing genomes. MetageNN is a neural network model that uses short k-mer profiles of sequences to reduce the impact of distribution shifts on error-prone long reads. Benchmarking MetageNN against other machine learning approaches for taxonomic classification (GeNet) showed substantial improvements with long-read data (20% improvement in F1 score). By utilizing nanopore sequencing data, MetageNN exhibits improved sensitivity in situations where the reference database is incomplete. It surpasses the alignment-based MetaMaps and MEGAN-LR, as well as the k-mer-based Kraken2 tools, with improvements of 100%, 36%, and 23% respectively at the read-level analysis. Notably, at the community level, MetageNN consistently demonstrated higher sensitivities than the previously mentioned tools. Furthermore, MetageNN requires < 1/4th of the database storage used by Kraken2, MEGAN-LR and MMseqs2 and is > 7× faster than MetaMaps and GeNet and > 2× faster than MEGAN-LR and MMseqs2. CONCLUSION: This proof of concept work demonstrates the utility of machine-learning-based methods for taxonomic classification using long reads. MetageNN can be used on sequences not classified by conventional methods and offers an alternative approach for memory-efficient classifiers that can be optimized further.
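
The "short k-mer profiles" mentioned in this abstract can be pictured as a fixed-length vector of k-mer frequencies computed from a read. The sketch below is a generic illustration, not MetageNN's actual featurization: the function name, the default k, the ACGT-only alphabet, and the normalization are all assumptions.

```python
from collections import Counter
from itertools import product


def kmer_profile(sequence, k=3):
    """Return normalized k-mer frequencies in lexicographic ACGT order.

    k-mers containing characters other than A, C, G, T (for example N)
    are ignored. The result always has 4**k entries.
    """
    sequence = sequence.upper()
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    kmers = ["".join(letters) for letters in product("ACGT", repeat=k)]
    total = sum(counts[kmer] for kmer in kmers)
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[kmer] / total for kmer in kmers]
```

For instance, `kmer_profile("ACGTACGT", k=2)` returns 16 values in which the entry for `AC` is 2/7, because the read contains seven overlapping 2-mers and `AC` occurs twice. A short k keeps the vector small, and an isolated sequencing error changes only the few k-mers that overlap it.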


Subjects
Metagenomics , Viverridae , Animals , Metagenomics/methods , Computer Neural Networks , Metagenome , Machine Learning , High-Throughput Nucleotide Sequencing/methods , DNA Sequence Analysis/methods